Computer-implemented method, device, and computer program product

ABSTRACT

Embodiments of the present disclosure relate to a computer-implemented method, a device, and a computer program product. The method includes extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents; and determining the number of documents associated with the themes within a second period according to a prediction model and based on the semantic information and frequencies of the themes. The second period is after the first period. Embodiments of the present disclosure can better predict the tendency of the themes appearing in the future based on the semantic information and frequencies of the themes.

RELATED APPLICATION(S)

The present application claims priority to Chinese Patent Application No. 202111229407.8, filed Oct. 21, 2021, and entitled “Computer-Implemented Method, Device, and Computer Program Product,” which is incorporated by reference herein in its entirety.

FIELD

Embodiments of the present disclosure relate generally to the field of computers and specifically to a computer-implemented method, a device, and a computer program product.

BACKGROUND

With the emergence and wide application of various technologies such as big data, the Internet of Things, and artificial intelligence, massive amounts of data in different fields are generated. From such massive data, various types of knowledge can be efficiently and transparently obtained and integrated, and the future development direction of science and technology can be predicted. For example, future interest in various themes, such as popular technological themes, can be predicted based on the massive data. However, the accuracy of predicting future trends of interest in different themes needs to be further improved.

SUMMARY

Embodiments of the present disclosure provide a computer-implemented method, a device, and a computer program product.

In a first aspect of the present disclosure, a computer-implemented method is provided. The method includes extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents; and determining the number of documents associated with the themes within a second period according to a prediction model and based on the semantic information and frequencies of the themes, wherein the second period is after the first period.

In a second aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processing unit and at least one memory. The at least one memory is coupled to the at least one processing unit and stores instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the electronic device to perform actions including extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents; and determining the number of documents associated with the themes within a second period according to a prediction model and based on the semantic information and frequencies of the themes, wherein the second period is after the first period.

In a third aspect of the present disclosure, a computer program product is provided. The computer program product is tangibly stored in a non-transitory computer storage medium and includes machine-executable instructions. The machine-executable instructions, when executed by a device, cause the device to execute any step of the method according to the first aspect of the present disclosure.

This Summary is provided to introduce a selection of concepts in a simplified form, which will be further described in the Detailed Description below. The Summary is neither intended to identify key features or essential features of the present disclosure, nor intended to limit the scope of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objectives, features, and advantages of the present disclosure will become more apparent from the following description of example embodiments of the present disclosure, provided in detail with reference to the accompanying drawings, and in the example embodiments of the present disclosure, the same reference numerals generally represent the same components.

FIG. 1 illustrates a schematic block diagram of an example environment in which some embodiments according to the present disclosure can be implemented;

FIG. 2 illustrates a schematic diagram of a prediction model according to some embodiments of the present disclosure;

FIG. 3 illustrates a flow chart of an example method for theme tendency prediction according to some embodiments of the present disclosure;

FIG. 4 illustrates an example block diagram of theme extraction according to some embodiments of the present disclosure;

FIG. 5 illustrates a schematic diagram of a theme extraction result according to some embodiments of the present disclosure;

FIG. 6 illustrates a schematic diagram of example output results of theme tendency prediction according to some embodiments of the present disclosure versus output results of a conventional solution;

FIG. 7 illustrates a schematic diagram of another output result of theme tendency prediction according to some embodiments of the present disclosure; and

FIG. 8 illustrates a schematic block diagram of an example device for implementing embodiments of the present disclosure.

Identical or corresponding numerals represent identical or corresponding parts in the various accompanying drawings.

DETAILED DESCRIPTION

Example embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although example embodiments of the present disclosure are illustrated in the accompanying drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments illustrated herein. Rather, these embodiments are provided to make the present disclosure more thorough and complete and to fully convey the scope of the present disclosure to those skilled in the art.

The term “include” and variants thereof used herein indicate open-ended inclusion, that is, “including but not limited to.” Unless otherwise stated, the term “or” means “and/or.” The term “based on” denotes “at least partially based on.” The terms “an example embodiment” and “an embodiment” denote “at least one example embodiment.” The term “another embodiment” means “at least one further embodiment.” The terms “first,” “second,” and the like may refer to different or the same objects. The following may also include other explicit and implicit definitions.

In embodiments of the present disclosure, the term “model” refers to an entity that is capable of processing inputs and providing corresponding outputs. A neural network model, for example, typically includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer. A model used in deep learning applications (also referred to as a “deep learning model”) usually includes many hidden layers, thereby increasing the depth of the network. All the layers of a neural network model are connected in sequence, so that an output of the previous layer is provided as an input to the next layer, wherein the input layer receives the input to the neural network model, and the output from the output layer is used as the final output of the neural network model. Each layer of the neural network model includes one or more nodes (also called processing nodes or neurons), each of which processes the input from the previous layer. Herein, the terms “neural network,” “model,” “network,” and “neural network model” can be used interchangeably.

As described above, some solutions have been proposed to predict future interest in various themes, such as popular technological themes, based on massive data. In some conventional solutions, the frequency with which various themes will be followed in a future period of time is usually predicted from the frequency with which the various themes were followed in a past period of time. For example, from the frequency with which a certain theme appears in papers released in a past period of time, the number of papers related to the theme that will be released in a future period of time is predicted.

However, this conventional theme tendency prediction method only considers the frequency with which a theme was followed in the past period of time, without considering other factors affecting the theme tendency. For example, some themes have a certain correlation with each other, and it is possible that the future change tendency of a certain theme affects the future change tendencies of other themes related to it. This conventional theme tendency prediction method does not consider such mutual influence between themes. Therefore, the accuracy of theme tendency prediction of this conventional solution needs to be further improved.

Embodiments of the present disclosure provide a computer-implemented method to solve one or more of the above problems and/or other potential problems. In this solution, respective themes of a set of documents with release time within a past period of time are extracted. The solution further includes determining respective semantic information of the different themes and frequencies of the themes appearing in the set of documents. The solution further includes determining the number of documents associated with the themes within a future period of time according to a prediction model and based on the semantic information and frequencies of the themes.

In this way, not only are the frequencies with which the themes appeared in the past period of time considered, but the semantic meanings of the themes themselves are also considered. By considering both the frequencies and the semantic meanings, the future interest tendency of the themes can be better predicted. In particular, the future tendencies of a plurality of themes with associated semantic meanings can be better predicted. Thus, the resulting prediction of future theme tendency is more meaningful.

The fundamental principles and several example embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

FIG. 1 illustrates a schematic diagram of environment 100 for predicting theme future tendency according to some embodiments of the present disclosure. It should be understood that the numbers and arrangement of entities, elements, and modules illustrated in FIG. 1 are examples only, and different numbers and different arrangements of entities, elements, and modules can be included in environment 100.

In environment 100 of FIG. 1, computing system 120 extracts respective themes of a set of documents 110. The issue time or release time of each document in the set of documents 110 is within a first period. For example, the documents in the set of documents 110 may be various types of documents such as journal articles, news reports, and professional books. Computing system 120 may adopt a preset algorithm or a pre-trained theme extraction model or the like to extract the respective themes of the set of documents 110. For example, the respective themes of the set of documents 110 may be extracted by using a model based on computer science ontology (CSO). An example theme extraction process will be introduced in more detail below.

Computing system 120 determines respective semantic information 140 of themes 130 and frequencies 150 of the themes appearing in the set of documents 110. Herein, the frequencies 150 of themes 130 appearing in the set of documents 110 represent the number of documents related to themes 130 in the set of documents 110. Computing system 120 may adopt a preset algorithm or a pre-trained semantic determining model or the like to determine the semantic information 140 of themes 130. Similarly, computing system 120 may adopt a preset algorithm or a pre-trained frequency determining model or the like to determine the frequencies 150 of themes 130. An example semantic information determining and frequency determining process will be introduced in more detail below.

As shown in FIG. 1, computing system 120 determines number 170 of documents associated with themes 130 within a second period according to prediction model 160 and based on semantic information 140 and frequencies 150 of themes 130. The second period is a period of time after the first period. For example, the second period is a future period of time. Number 170 can reflect the interest tendency of themes 130 in the future.

In some embodiments, prediction model 160 may be a pre-trained neural network model. FIG. 2 illustrates a schematic diagram of example prediction model 160 according to some embodiments of the present disclosure. As shown in FIG. 2, prediction model 160 may be a long short-term memory (LSTM) network. In the example of FIG. 2, prediction model 160 includes input gate 230, output gate 240, and forgetting gate 250. Input X_t 210 received by prediction model 160 is input to input gate 230, output gate 240, and forgetting gate 250. After passing through the internal computing logic shown in FIG. 2 (including a plurality of multiplication gates, addition gates, and the like), a result corresponding to input X_t 210 may be obtained, that is, output 220. Output 220 may be the predicted number or frequencies corresponding to input X_t 210.

By adopting prediction model 160 such as the LSTM, the vanishing gradient problem that occurs in a recurrent neural network (RNN) can be addressed. This is achieved by the plurality of multiplication gates inside the LSTM enforcing a constant error flow in the internal state of a special unit referred to as a “memory unit.” By using example prediction model 160 as shown in FIG. 2, the memory content may be prevented from being interfered with by irrelevant inputs and outputs through input gate 230, output gate 240, and forgetting gate 250, thereby achieving long-term memory storage. In addition, because the LSTM can learn long-term correlations in a sequence, it does not require a pre-specified time window or accurate modeling of a complex multi-variate sequence. Therefore, by adopting prediction model 160 such as the LSTM, the theme tendency can be better predicted.
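By way of a non-limiting illustration, the following is a minimal sketch of how an LSTM-based prediction model of this kind could be implemented, assuming PyTorch; the class name, layer sizes, prediction horizon, and the choice of concatenated semantic and frequency features of equal dimension are illustrative assumptions rather than details prescribed by the present disclosure.

```python
import torch
import torch.nn as nn

class ThemeTendencyLSTM(nn.Module):
    """Illustrative LSTM mapping a per-theme input time sequence to predicted document counts."""

    def __init__(self, input_dim: int = 1536, hidden_dim: int = 128, horizon: int = 7):
        super().__init__()
        # nn.LSTM internally realizes the input, output, and forget gating described for FIG. 2.
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Linear head producing one predicted document count per future time interval point.
        self.head = nn.Linear(hidden_dim, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch of themes, past time interval points, input_dim)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])  # (batch of themes, horizon)

# Usage sketch: 8 themes, 30 past daily interval points, each point assumed to be
# a 768-dim semantic representation concatenated with a 768-dim frequency representation.
model = ThemeTendencyLSTM(input_dim=1536)
past_sequence = torch.randn(8, 30, 1536)
predicted_counts = model(past_sequence)
print(predicted_counts.shape)  # torch.Size([8, 7])
```

Such a sketch would be trained on historical number time sequences; the present disclosure does not limit prediction model 160 to this particular architecture or framework.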

It should be understood that prediction model 160 may also use other machine learning models. Example prediction model 160 shown in FIG. 2 is only for the purpose of illustration, and the scope of the present disclosure is not limited in this regard.

Example environment 100 according to some embodiments of the present disclosure is described above in combination with FIG. 1. A flow chart of method 300 for theme tendency prediction according to some embodiments of the present disclosure will be described below with reference to FIG. 3. Method 300 can be implemented by computing system 120 of FIG. 1. It should be understood that method 300 may also be executed by other suitable devices or apparatuses. Method 300 may include additional actions not shown and/or may omit actions shown, and the scope of the present disclosure is not limited in this regard. For ease of description, method 300 will be described with reference to FIG. 1.

As shown in FIG. 3, at 310, computing system 120 extracts respective themes of the set of documents 110 with release time within a first period. For example, computing system 120 may extract documents with release time within the first period as the set of documents 110 from a certain database (e.g., Springer, CNKI, etc.) or other online resources (e.g., Google, etc.). The first period may be a period of time spanning, for example, one week, one month, one quarter, or another time span counted back from the current time.

In some embodiments, computing system 120 may extract all the documents with release time within the first period as the set of documents 110. Additionally or alternatively, computing system 120 may extract, as the set of documents 110, a predefined number of the most frequently accessed documents among all the documents with release time within the first period. The predefined number may be, for example, 1000, 10000, or any other proper number. It should be understood that the set of documents 110 may also be obtained using other methods.

Computing system 120 may use a theme classifying model to extract the respective themes of the set of documents 110. For example, the theme classifying model may be a pre-trained CSO theme classifying model. FIG. 4 illustrates example block diagram 400 of theme extraction using theme classifying model 410 according to some embodiments of the present disclosure.

As shown in FIG. 4, theme classifying model 410 may include syntax module 420, semantic module 430, and postprocessing module 440. Theme classifying model 410 may receive the set of documents 110 and extract the respective themes of the documents in the set. A theme extraction process of theme classifying model 410 is described below, taking document 401 in the set of documents 110 as an example.

In some embodiments, syntax module 420 may divide the content of document 401 into a plurality of segments (n-grams). In some embodiments, syntax module 420 may divide a title, an abstract, and keywords of document 401 into a plurality of segments. Alternatively, syntax module 420 may also divide other content of document 401, such as the content of a summary part at the end, into a plurality of segments. This solution is not limited in this regard. Syntax module 420 further determines similarities between the segments and the different themes (or concepts).

In some embodiments, semantic module 430 may be a module based on CSO and word marking or embedding. Semantic module 430 may extract entities from the plurality of segments divided from document 401. Herein, the entities may refer to words or phrases associated with a certain theme. For example, the entities may be phrases “image recognition,” “three-dimensional (3D) reconstruction,” and the like associated with the theme “neural network.”

Additionally or alternatively, semantic module 430 may also perform theme identification on the extracted entities. For example, if the extracted entity is “3D reconstruction,” semantic module 430 may identify the entity as being associated with “neural network.”

In some embodiments, semantic module 430 may sort the themes identified in document 401 according to the number of times or frequency with which each theme is identified. Semantic module 430 may select a predefined number (e.g., 5) of top-ranked themes from the sorted result as the themes of document 401. It should be understood that the number 5 listed herein is only an example, and the predefined number may be any proper number.
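As a non-limiting illustration of this ranking step, the following sketch counts how often each candidate theme is identified in a document and keeps the top-ranked ones; the function name and the sample identification results are hypothetical and are not part of theme classifying model 410 as disclosed.

```python
from collections import Counter

def select_top_themes(identified_themes, top_k=5):
    """Rank candidate themes by how often they were identified and keep the top_k."""
    counts = Counter(identified_themes)
    return [theme for theme, _ in counts.most_common(top_k)]

# Hypothetical identification results for one document.
identified = ["neural network", "point cloud", "neural network",
              "3d reconstruction", "point cloud", "neural network"]
print(select_top_themes(identified, top_k=2))  # ['neural network', 'point cloud']
```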

As shown in FIG. 4, theme classifying model 410 may further include postprocessing module 440. Postprocessing module 440 may combine the segments output by syntax module 420 with the themes output by semantic module 430. For example, postprocessing module 440 may combine the segments with the themes according to the similarities between the segments and the themes determined by syntax module 420. In some embodiments, postprocessing module 440 adopts a form such as a knowledge graph to output the combined segments and themes.

Additionally or alternatively, in some embodiments, postprocessing module 440 is configured to filter the output results. For example, postprocessing module 440 may determine the similarities among the predefined number of themes output for document 401 by semantic module 430. If it is determined that the similarity between a certain theme among the predefined number of themes and the other themes is lower than a similarity threshold value (e.g., the similarity may be a value between 0 and 1), that theme may be removed.

In some embodiments, postprocessing module 440 may further enhance the output results. For example, if postprocessing module 440 determines that the output themes include a first theme and a second theme that is subsumed by the first theme and is more refined, postprocessing module 440 may remove the first theme from the output results. For example, if postprocessing module 440 determines that the output results include a theme “artificial intelligence” and a theme “recurrent neural network,” postprocessing module 440 may retain only the theme “recurrent neural network” and remove the theme “artificial intelligence.”
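A minimal sketch of such an enhancement step follows, assuming a hypothetical mapping from each theme to the broader themes that subsume it (for instance, one that could be derived from an ontology such as CSO); the mapping, function name, and sample data are illustrative only.

```python
def enhance_themes(themes, parent_map):
    """Drop a broader theme whenever a more refined theme it subsumes is also present."""
    themes = set(themes)
    broader = set()
    for theme in themes:
        # Every broader theme of a retained theme is considered redundant.
        broader.update(parent_map.get(theme, []))
    return themes - broader

# Hypothetical subsumption relationships.
parents = {"recurrent neural network": ["neural network", "artificial intelligence"]}
print(enhance_themes({"artificial intelligence", "recurrent neural network"}, parents))
# {'recurrent neural network'}
```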

The process of performing theme extraction on document 401 in the set of documents 110 is described above in combination with FIG. 4. It should be understood that theme extraction may also be performed on other documents in the set of documents 110 in a similar way.

By using theme classifying model 410 described above in combination with FIG. 4, the respective themes of the set of documents can be extracted more accurately. It should be understood that theme classifying model 410 described in FIG. 4 is only for the purpose of illustration, and is not restrictive. Computing system 120 may adopt other proper algorithms or a pre-trained machine learning model or the like to extract the respective themes of the set of documents 110.

FIG. 5 illustrates a schematic diagram of theme extraction result 500 according to some embodiments of the present disclosure. Theme extraction result 500 shown in FIG. 5 may be obtained by theme classifying model 410 described with reference to FIG. 4. Theme extraction result 500 in FIG. 5 may also be obtained by using other theme extraction methods. As shown in FIG. 5, document information 510 includes document information of an example document in the set of documents 110. Document information 510 includes a title, an abstract, and keywords of the document. It should be understood that computing system 120 may also receive document information of other documents in the set of documents 110.

Example result 520 in FIG. 5 has a plurality of output results. For example, a result included in field “union” 530 includes a theme extracted according to document information 510. A result included in field “enhanced” 540 includes a filtered and enhanced theme obtained from the result included in field “union” 530 through postprocessing module 440.

In the example of FIG. 5, other additional output results are further shown. For example, content included in a field “syntactic” may be a segment obtained by syntax module 420. Content included in a field “semantic” may be a theme identified by semantic module 430. Content included in a field “explanation” may be information obtained by combining the theme and a segment corresponding to the theme. In addition, FIG. 5 further illustrates output result 550 corresponding to another document. Output result 550 is similar to result 520.

It should be understood that theme extraction result 500 shown in FIG. 5 is only an example, and not restrictive. In some embodiments, the output result may only include the themes, excluding the other additional information in FIG. 5. In some embodiments, the output result may include more additional information. By adding other information besides the themes into the output result, more information may be provided for analysis in a subsequent process of presenting results to a user.

Continuing with reference to FIG. 3, at 320, computing system 120 determines respective semantic information 140 of the themes and frequencies 150 of the themes appearing in the set of documents 110. In some embodiments, computing system 120 may determine the respective semantic information and frequency of each theme extracted from each document in the set of documents 110. In some embodiments, computing system 120 may also select some themes (e.g., one or more themes of interest) from the themes extracted from each document in the set of documents 110 to determine the respective semantic information and frequencies of those themes.

In some embodiments, computing system 120 may determine the semantic information of the themes by determining a time sequence of semantic representations of the themes within the first period. For example, the themes at time interval points within the first period may be encoded into the time sequence of semantic representations. The semantic representations at the time interval points in the time sequence of semantic representations are configured to represent semantic meanings of the themes, and may also be regarded as word embeddings.

Taking the simple statements “Have a good day” and “Have a great day” as an example, the dimension of a word set {“have,” “a,” “good,” “great,” “day”} contained in the above two statements is 5. Therefore, each word may be encoded respectively by using vectors of 5 dimensions. For example, the word “have” may be encoded as [1, 0, 0, 0, 0]. The word “a” may be encoded as [0, 1, 0, 0, 0]. The word “good” may be encoded as [0, 0, 1, 0, 0]. The word “great” may be encoded as [0, 0, 0, 1, 0]. The word “day” may be encoded as [0, 0, 0, 0, 1]. The themes at the time interval points within the first period may be encoded in a similar way to obtain the time sequence of semantic representations.
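The one-hot encoding in the example above could be produced, for instance, by the short sketch below; the function name and vocabulary handling are illustrative assumptions.

```python
def one_hot_encode(words):
    """Map each distinct word to a one-hot vector, preserving first-appearance order."""
    vocab = list(dict.fromkeys(words))
    return {w: [1 if i == j else 0 for j in range(len(vocab))]
            for i, w in enumerate(vocab)}

codes = one_hot_encode(["have", "a", "good", "great", "day"])
print(codes["have"])  # [1, 0, 0, 0, 0]
print(codes["day"])   # [0, 0, 0, 0, 1]
```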

In some embodiments, the time sequence of semantic representations may be determined by adopting a pre-trained semantic encoding model. The pre-trained semantic encoding model may be implemented by using any suitable type of neural network. For example, the pre-trained semantic encoding model may be implemented by using a transformer or a bidirectional encoder representations from transformers (BERT) model. It should be understood that the semantic encoding model adopted herein may adapt to various languages, such as English, Chinese, etc.

In some embodiments, for a time interval point within the first period, the pre-trained semantic encoding model determines the semantic representation, at that time interval point, of the time sequence of semantic representations based on the words, or the words in phrases, corresponding to a certain theme in the documents of the set of documents 110 with release time not later than the time interval point.

In some embodiments, a classification token ([CLS] token) corresponding to the theme, as determined by the pre-trained semantic encoding model, may be used as the semantic representation of the theme. In this way, themes with different numbers of words may be represented by single classification tokens of the same dimension, so that subsequent theme tendency prediction and other tasks can be conveniently performed.
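As a non-limiting illustration, the following sketch obtains such a [CLS] representation for a theme phrase, assuming the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint; the present disclosure does not prescribe this particular library, checkpoint, or helper function.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def theme_cls_embedding(theme: str) -> torch.Tensor:
    """Return the [CLS] hidden state as a fixed-size semantic representation of the theme."""
    inputs = tokenizer(theme, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Position 0 of the last hidden state corresponds to the [CLS] token.
    return outputs.last_hidden_state[:, 0, :].squeeze(0)

embedding = theme_cls_embedding("3d human face reconstruction")
print(embedding.shape)  # torch.Size([768]) for bert-base-uncased
```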

The time interval points within the first period may have the same or different time intervals. For example, taking the first period being a week as an example, the first period may have seven time interval points with one day as an interval. The pre-trained semantic encoding model may determine semantic information of the theme at each time interval point within the period. It should be understood that the span of the first period and the interval of the time interval points listed in the present disclosure are only examples, and not restrictive.

The semantic information of the themes can be accurately extracted by obtaining the semantic representations of the themes through a pre-trained semantic encoding model such as BERT. In addition, by properly encoding the semantic information, subsequent tasks such as theme tendency prediction can be conveniently performed.

In some embodiments, computing system 120 may determine the frequencies of the themes by determining a time sequence of frequency representations of the themes within the first period. For example, for a time interval point within the first period, computing system 120 may determine a frequency representation of the time sequence of frequency representations at the time interval point based on the number of documents corresponding to the themes among the documents in the set of documents 110 with release time not later than the time interval point.
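A small sketch of how such a per-time-point document count could be computed is given below; the record layout, field names, and dates are hypothetical.

```python
from datetime import date

def theme_count_at(documents, theme, interval_point):
    """Count documents associated with a theme and released no later than the interval point."""
    return sum(1 for doc in documents
               if theme in doc["themes"] and doc["release_date"] <= interval_point)

# Hypothetical document records.
docs = [
    {"themes": {"point cloud"}, "release_date": date(2021, 7, 24)},
    {"themes": {"point cloud", "3d human face reconstruction"}, "release_date": date(2021, 7, 25)},
]
print(theme_count_at(docs, "point cloud", date(2021, 7, 24)))  # 1
print(theme_count_at(docs, "point cloud", date(2021, 7, 26)))  # 2
```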

Additionally or alternatively, computing system 120 may also use a position extension code to determine the frequency representation of the time sequence of frequency representations of the themes at the time interval points. The position extension code may be, for example, an extension code of a cosine probability function. An example position code is shown as follows:

$$D_{it} = \begin{cases} PE(t, k) = \sin(t \times \omega_n), & \text{if } k = 2n \\ PE(t, k) = \cos(t \times \omega_n), & \text{if } k = 2n + 1 \end{cases} \qquad (1)$$

In formula (1), t represents a frequency of a theme, D_it represents the frequency representation of the frequency t, k represents the k-th component of the frequency representation, the frequency representation has a dimension d, and k ≤ d.

In some embodiments, ω_n in formula (1) may be represented as follows:

$$\omega_n = \frac{1}{10000^{2n/d}} \qquad (2)$$

By adopting the position extension code mode listed above, the one-dimensional frequency of the theme may be extended into a multi-dimensional frequency representation. An example of one frequency representation obtained by the code is shown as formula (3) below:

$$\vec{p}_t = \begin{bmatrix} \sin(\omega_1 \cdot t) \\ \cos(\omega_1 \cdot t) \\ \sin(\omega_2 \cdot t) \\ \cos(\omega_2 \cdot t) \\ \vdots \\ \sin(\omega_{d/2} \cdot t) \\ \cos(\omega_{d/2} \cdot t) \end{bmatrix}_{d \times 1} \qquad (3)$$

In some embodiments, the dimension d of the frequency representation may be a preset number greater than 1. For example, if the semantic representation at each time interval point in the time sequence of semantic representations has a dimension of 728, d may be set to 728 as well. It should be understood that the number of dimensions listed above is only an example, and that d may also be set to a dimension that is the same as, greater than, or less than the dimension of the semantic representation. The frequency representation may adopt the form of the vector shown in formula (3), for example, or may adopt other data forms. This solution is not limited in this regard.

It should be understood that the position extension code method listed above is only an example, and not restrictive. Other extension code methods may be adopted to extend the one-dimensional frequency into a multi-dimensional frequency representation.
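For illustration, the sketch below implements a frequency encoding of this kind following formulas (1) through (3), assuming numpy and an even illustrative dimension d; neither the dimension nor the helper name is mandated by the present disclosure.

```python
import numpy as np

def frequency_representation(t: float, d: int) -> np.ndarray:
    """Extend a scalar frequency t into a d-dimensional sin/cos representation per formulas (1)-(3)."""
    n = np.arange(d // 2)
    omega = 1.0 / (10000 ** (2 * n / d))   # formula (2)
    rep = np.empty(d)
    rep[0::2] = np.sin(t * omega)          # even components, k = 2n
    rep[1::2] = np.cos(t * omega)          # odd components, k = 2n + 1
    return rep

print(frequency_representation(t=42, d=8))  # 8-dimensional representation of the count 42
```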

By making the frequency representation of the theme multi-dimensional, the frequency of the theme may be prevented from being ignored during theme tendency prediction, which could otherwise happen when a multi-dimensional semantic representation is combined with a single-dimensional frequency. In this way, the semantic information and frequencies of the themes can be considered more fully, without the influence of frequency being lost during prediction due to the difference in dimensions.

Continuing with reference to FIG. 3, at 330, computing system 120 determines the number of documents associated with the themes within a second period according to prediction model 160 and based on semantic information 140 and frequencies 150 of the themes. The second period is after the first period. For example, the second period may be a future period of time after the current time. For example, the second period may be one week, one month, or one year in the future.

In some embodiments, prediction model 160 may adopt a pre-trained LSTM model, as described with reference to FIG. 2. In some embodiments, prediction model 160 may also use other proper neural network models.

In some embodiments, computing system 120 may determine a number time sequence of the themes within the second period. The number time sequence includes the number of documents associated with the themes at each time interval point within the second period. For example, computing system 120 may determine the number time sequence of the themes within the second period according to prediction model 160 and based on the time sequence of semantic representations and the time sequence of frequency representations of the themes.

In some embodiments, computing system 120 may determine, for each theme extracted from the set of documents 110, the number of documents associated with that theme within the second period. In some embodiments, computing system 120 may only analyze certain of those themes, such as some themes of interest to a user. In other words, computing system 120 may determine, only for one or more of the themes extracted from the set of documents 110, the number of associated documents within the second period.

In this way, during future tendency prediction for the themes, both the frequencies with which the themes appeared in the past and the semantic information of the themes are considered, so the theme tendency may be predicted more accurately. Especially for associated themes, analyzing their semantic meanings allows the association between them to be taken into account, so that more accurate prediction results are obtained. For example, the themes “artificial intelligence” and “machine learning” have similar semantic meanings, and similar predicted tendencies for these two themes can be obtained by using the theme tendency prediction of this solution. In addition, by predicting the number of documents associated with the themes within a future period through the method of this solution, the future technology development tendency can be analyzed and predicted.

FIG. 6 illustrates a schematic diagram of example output result 600 of theme tendency prediction according to some embodiments of the present disclosure versus output result 650 of a conventional solution. Example output result 600 in FIG. 6 represents prediction of the number of documents associated with the themes within an expected time according to information from July 24, July 25, and July 26 by using the solution of the present disclosure. Example output result 650 represents prediction of the number of documents associated with the themes within the expected time according to the information from July 24, July 25, and July 26 by using the conventional solution.

Result 610 in FIG. 6 represents a result obtained by predicting the theme “point cloud” by using the solution of the present disclosure. Result 620 represents a result obtained by predicting the theme “3D human face reconstruction” by using the solution of the present disclosure. Result 660 represents a result obtained by predicting the theme “point cloud” by using the conventional solution. Result 670 represents a result obtained by predicting the theme “3D human face reconstruction” by using the conventional solution.

The point cloud is a conventional method for processing 3D images. “Point cloud” may not be mentioned in the titles, abstracts, keywords, and other key parts of some documents about the theme “3D human face reconstruction.” However, it is possible that these documents are actually associated with “point cloud.” As shown in FIG. 6, by using the conventional solution, it may only be predicted that the number of documents associated with the theme “3D human face reconstruction” increases, as indicated by prediction result 670. Prediction result 660 of the conventional solution, however, indicates that the number of documents associated with the theme “point cloud” does not increase.

In contrast, by using the solution of the present disclosure, it can be taken into account that the theme “point cloud” and the theme “3D human face reconstruction” have a certain semantic connection. Thus, result 610 and result 620 predict that the number of documents associated with the theme “point cloud” and the number of documents associated with the theme “3D human face reconstruction” both increase.

It can be seen by comparing output result 600 and output result 650 that the theme prediction result obtained by using the solution of the present disclosure can consider the semantic information of the themes, and therefore the future tendency of the themes can be better predicted.

It should be understood that the results shown in FIG. 6 are only illustrative. FIG. 6 only shows the future tendency prediction of two themes. In some embodiments, fewer or more themes may be predicted. In addition, theme tendency prediction within a shorter or longer future time may also be performed by using more or better document information from the past.

Through testing, theme tendency prediction using the conventional solution has a loss rate of about 0.1% in 2000 rounds. In comparison, theme tendency prediction using the present solution has a loss rate of about 0.0625% in 2000 rounds. The theme tendency prediction using the present solution therefore has better performance in terms of loss rate.

FIG. 7 illustrates a schematic diagram of another output result 700 of theme tendency prediction according to some embodiments of the present disclosure. By using the solution of the present disclosure, the interest condition of various different themes at a certain time point may also be predicted. For example, as shown in FIG. 7, different gray blocks may represent different themes. The areas of the different blocks may represent the number of issued documents associated with the themes. For example, block 710 represents that the theme corresponding to that block has the most interest at that time point.

It should be understood that the theme tendency prediction method according to the embodiments of the present disclosure may also provide prediction information other than that illustrated in FIG. 6 and FIG. 7. By using the present solution, the future science and technology development direction and the like can be predicted and analyzed.

FIG. 8 shows a schematic block diagram of example device 800 that may be configured to implement embodiments of the present disclosure. For example, computing system 120 as shown in FIG. 1 can be implemented by device 800. As shown in FIG. 8, device 800 includes central processing unit (CPU) 801 that can perform various appropriate actions and processing according to computer program instructions stored in read-only memory (ROM) 802 or computer program instructions loaded from storage unit 808 to random access memory (RAM) 803. Various programs and data required for the operation of device 800 may also be stored in RAM 803. CPU 801, ROM 802, and RAM 803 are connected to each other through bus 804. Input/output (I/O) interface 805 is also connected to bus 804.

A plurality of components in device 800 are connected to I/O interface 805, including: input unit 806, such as a keyboard and a mouse; output unit 807, such as various types of displays and speakers; storage unit 808, such as a magnetic disk and an optical disc; and communication unit 809, such as a network card, a modem, and a wireless communication transceiver. In some embodiments, input samples can be input to device 800 via input unit 806. Communication unit 809 allows device 800 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunication networks.

The various processes and processing procedures described above, such as method 300, may be performed by CPU 801. For example, in some embodiments, method 300 may be implemented as a computer software program that is tangibly included in a machine-readable medium such as storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into RAM 803 and executed by CPU 801, one or more actions of method 300 described above may be implemented.

Illustrative embodiments of the present disclosure include a method, an apparatus, a system, and/or a computer program product. The computer program product may include a computer-readable storage medium on which computer-readable program instructions for performing various aspects of the present disclosure are loaded.

The computer-readable storage medium may be a tangible device that may hold and store instructions used by an instruction-executing device. For example, the computer-readable storage medium may be, but is not limited to, an electric storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, for example, a punch card or a raised structure in a groove with instructions stored thereon, and any appropriate combination of the foregoing. The computer-readable storage medium used herein is not to be interpreted as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the computing/processing device.

The computer program instructions for executing the operation of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, status setting data, or source code or object code written in any combination of one or more programming languages, the programming languages including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the C language or similar programming languages. The computer-readable program instructions may be executed entirely on a user computer, partly on a user computer, as a stand-alone software package, partly on a user computer and partly on a remote computer, or entirely on a remote computer or a server. In a case where a remote computer is involved, the remote computer may be connected to a user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, connected through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is customized by utilizing status information of the computer-readable program instructions. The electronic circuit may execute the computer-readable program instructions to implement various aspects of the present disclosure.

Various aspects of the present disclosure are described herein with reference to flow charts and/or block diagrams of the method, the apparatus (system), and the computer program product implemented according to embodiments of the present disclosure. It should be understood that each block of the flow charts and/or the block diagrams and combinations of blocks in the flow charts and/or the block diagrams may be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or a further programmable data processing apparatus, thereby producing a machine, such that these instructions, when executed by the processing unit of the computer or the further programmable data processing apparatus, produce means for implementing functions/actions specified in one or more blocks in the flow charts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner; and thus the computer-readable medium having instructions stored thereon includes an article of manufacture that includes instructions that implement various aspects of the functions/actions specified in one or more blocks in the flow charts and/or block diagrams.

The computer-readable program instructions may also be loaded to a computer, a further programmable data processing apparatus, or a further device, so that a series of operating steps may be performed on the computer, the further programmable data processing apparatus, or the further device to produce a computer-implemented process, such that the instructions executed on the computer, the further programmable data processing apparatus, or the further device may implement the functions/actions specified in one or more blocks in the flow charts and/or block diagrams.

The flow charts and block diagrams in the drawings illustrate the architectures, functions, and operations of possible implementations of the systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flow charts or block diagrams may represent a module, a program segment, or part of an instruction, the module, program segment, or part of an instruction including one or more executable instructions for implementing specified logical functions. In some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the accompanying drawings. For example, two successive blocks may actually be executed substantially in parallel, and sometimes they may also be executed in a reverse order, depending on the functions involved. It should be further noted that each block in the block diagrams and/or flow charts, as well as a combination of blocks in the block diagrams and/or flow charts, may be implemented by using a special hardware-based system that executes specified functions or actions, or implemented using a combination of special hardware and computer instructions.

Illustrative embodiments of the present disclosure have been described above. The above description is illustrative, rather than exhaustive, and is not limited to the disclosed various embodiments. Numerous modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The selection of terms used herein is intended to best explain the principles and practical applications of the various embodiments or the improvements to technologies on the market, so as to enable persons of ordinary skill in the art to understand the embodiments disclosed herein.

What is claimed is:
1. A computer-implemented method, comprising: extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents; and determining the number of documents associated with the themes within a second period according to a prediction model and based on the semantic information and frequencies of the themes, wherein the second period is after the first period.
2. The method according to claim 1, wherein determining the semantic information comprises: determining a time sequence of semantic representations of the themes within the first period.
3. The method according to claim 2, wherein determining the time sequence of semantic representations comprises: for a time interval point within the first period, determining a semantic representation of the time sequence of semantic representations at the time interval point according to a semantic encoding model and based on words or words in phrases corresponding to the themes in documents with release time not later than the time interval point in the set of documents.
4. The method according to claim 1, wherein determining the frequencies comprises determining a time sequence of frequency representations of the themes within the first period.
5. The method according to claim 4, wherein determining the time sequence of frequency representations comprises: for a time interval point within the first period, determining a frequency representation of the time sequence of frequency representations at the time interval point based on the number of documents corresponding to the themes in documents with release time not later than the time interval point in the set of documents.
6. The method according to claim 5, wherein determining the frequency representation at the time interval point comprises: determining the frequency representation by using a position extending code based on the number, wherein the frequency representation has a predefined dimension which is greater than one dimension.
7. The method according to claim 1, wherein extracting the respective themes of the set of documents comprises: extracting a predefined number of respective themes of the set of documents by using a theme classifying model.
8. The method according to claim 1, wherein determining the number of the documents associated with the themes within the second period comprises: determining a number time sequence of the themes within the second period, wherein the number time sequence comprises the number of documents associated with the themes at each time interval point within the second period.
9. An electronic device, comprising: at least one processor; and at least one memory storing computer program instructions, wherein the at least one memory and the computer program instructions are configured to cause, together with the at least one processor, the electronic device to perform actions comprising: extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents; and determining the number of documents associated with the themes within a second period according to a prediction model and based on the semantic information and frequencies of the themes, wherein the second period is after the first period.
10. The electronic device according to claim 9, wherein determining the semantic information comprises: determining a time sequence of semantic representations of the themes within the first period.
11. The electronic device according to claim 10, wherein determining the time sequence of semantic representations comprises: for a time interval point within the first period, determining a semantic representation of the time sequence of semantic representations at the time interval point according to a semantic encoding model and based on words or words in phrases corresponding to the themes in documents with release time not later than the time interval point in the set of documents.
12. The electronic device according to claim 9, wherein determining the frequencies comprises determining a time sequence of frequency representations of the themes within the first period.
13. The electronic device according to claim 12, wherein determining the time sequence of frequency representations comprises: for a time interval point within the first period, determining a frequency representation of the time sequence of frequency representations at the time interval point based on the number of documents corresponding to the themes in documents with release time not later than the time interval point in the set of documents.
14. The electronic device according to claim 13, wherein determining the frequency representation at the time interval point comprises: determining the frequency representation by using a position extending code based on the number, wherein the frequency representation has a predefined dimension which is greater than one dimension.
15. The electronic device according to claim 9, wherein extracting the respective themes of the set of documents comprises: extracting a predefined number of respective themes of the set of documents by using a theme classifying model.
16. The electronic device according to claim 9, wherein determining the number of the documents associated with the themes within the second period comprises: determining a number time sequence of the themes within the second period, wherein the number time sequence comprises the number of documents associated with the themes at each time interval point within the second period.
17. A computer program product tangibly stored in a non-volatile computer-readable medium and including machine-executable instructions, wherein the machine-executable instructions, when executed, cause a device to execute a method, the method comprising: extracting respective themes of a set of documents with release time within a first period; determining respective semantic information of the themes and frequencies of the themes appearing in the set of documents; and determining the number of documents associated with the themes within a second period according to a prediction model and based on the semantic information and frequencies of the themes, wherein the second period is after the first period.
18. The computer program product according to claim 17, wherein determining the semantic information comprises: determining a time sequence of semantic representations of the themes within the first period.
19. The computer program product according to claim 18, wherein determining the time sequence of semantic representations comprises: for a time interval point within the first period, determining a semantic representation of the time sequence of semantic representations at the time interval point according to a semantic encoding model and based on words or words in phrases corresponding to the themes in documents with release time not later than the time interval point in the set of documents.
20. The computer program product according to claim 17, wherein determining the frequencies comprises determining a time sequence of frequency representations of the themes within the first period.